Abstract: The low-rank matrix completion problem can be
succinctly stated as follows: given a subset of the entries of a 
matrix, find a low-rank matrix consistent with the observations. 
While several low-complexity algorithms for matrix completion 
have been proposed so far, it remains an open problem to devise 
search procedures with provable performance guarantees for a 
broad class of matrix models. The standard approach to the 
problem, which involves the minimization of an objective function 
defined using the Frobenius metric, has inherent difficulties: the 
objective function is not continuous and the solution set is not
closed. To address this problem, we consider an optimization
procedure that searches for a column (or row) space that
is geometrically consistent with the partial observations. The
geometric objective function is continuous everywhere and the 
solution set is the closure of the solution set of the Frobenius 
metric. We also preclude the existence of local minimizers, 
and hence establish strong performance guarantees, for special 
completion scenarios, which do not require matrix incoherence 
or large matrix size. 

I. Introduction

In many practical applications of data acquisition, the signals
of interest have a sparse representation in some basis.
That is, they can be well approximated using only a few basis 
elements. This allows for efficient sampling and reconstruction 
of signals IT],©,!!!,!!!,!^),©. More precisely, the number 
of linear measurements required to capture a sparse signal 
can be much smaller than the number of inherent dimensions 
of the signal, and various polynomial time algorithms are
known for accurately reconstructing the sparse signal based
on these linear measurements. Due to the significant reduction 
in sampling resources and modest requirements for compu- 
tational resources, sparse signal processing has been studied 
intensively Q], 0, E], E), HI, 0. 

There are two categories of sparse signals which frequently 
arise in applications. In the first category, the sparse signal can 
be modeled as a vector with only a small fraction of non-zero
entries. Compressive sensing is the framework of sampling and 
recovering such signals. In the second category, the signals are 
represented by matrices whose ranks are much smaller than 
either of their dimensions. In the second setting, one of the 
fundamental problems of sparse signal processing is the low- 
rank matrix completion problem - to determine when and how 
one can recover a low-rank matrix based on only a subset of 
its entries 0, O, ||3- 

Scores of methods and algorithms have been proposed for 
low-rank matrix completion. Many of them are based on
similarities between compressive sensing reconstruction and low-
rank matrix completion. In general, both reconstruction tasks 
are ill-posed and computationally intractable. Nevertheless, 
exact recovery in an efficient manner is possible for both signal 
categories provided that the signal is sufficiently sparse or suf- 
ficiently densely sampled. Casting the sparse signal recovery 
problem as an optimization problem, $\ell_1$-minimization has been
proposed for compressive sensing signal reconstruction [l], 
II2I, O. Following the same idea, methods based on nuclear 
norm minimization have been developed for low-rank matrix 
completion [51, Q, ISl, |[9|. In terms of greedy algorithms, 
many of the approaches for low-rank completion can be 
viewed as generalizations of their counterparts for compressive 
sensing reconstruction. In particular, the ADMiRA algorithm 
[10] is a counterpart of the subspace pursuit (SP) [11] and
CoSaMP [12] algorithms, while the singular value projection
(SVP) method [13] extends the iterative hard thresholding
(IHT) [14] approach. There are also other approaches that
utilize some special structural properties of the low-rank
matrices. Examples include the power factorization algorithm
[15], the OptSpace algorithm [16], and the subspace evolution
and transfer algorithm [17].

Nevertheless, there is a fundamental problem in low-rank 
matrix completion which has not been successfully addressed 
yet: how to search for a low-rank matrix consistent with 
partial observations. The fundamental difference between com- 
pressive sensing and low-rank matrix completion lies in the 
knowledge of the "sparse basis". In compressive sensing, the 
basis under which the signal is sparse is known a priori. In 
principle, the support set of the nonzero entries can be found 
by exhaustive search. However, in low-rank matrix completion, 
the corresponding "sparse basis" is not known. Note that the 
set of all possible bases forms a continuous space. In such a 
space, "exhaustive" search is impossible. Moreover, we shall 
show, in Example 1 of Section III, that a direct gradient-
descent search does not work either.

The understanding of the search for consistent matrices 
is incomplete. There are two special cases where specially 
designed algorithms can guarantee a consistent low-rank so- 
lution. The first case is when the low-rank matrix is fully 
sampled. The consistent low-rank solution is simply the obser- 
vation matrix itself. The corresponding "sparse basis" (singular 
vectors) can be easily obtained by a singular value decomposition.
The other case is when the rank equals one. Given
an arbitrary sampling pattern, one simply looks at the ratios
between the revealed entries in the same column and uses
these ratios to construct a column vector that represents the 
column space. This method is guaranteed to find a consistent 
solution for rank-one matrices. However, it remains an open 
problem how to extend this method for general ranks. Hence, 
such an approach is not universal. On the other hand, none of
the existing general algorithms provides a performance guarantee
even for the rank-one case. The performance guarantee of
nuclear norm minimization is built on incoherence conditions,
which only hold with high probability when the low-rank
matrix is drawn randomly from certain ensembles and when 
the size of the matrix is sufficiently large. Our understanding 
of low-rank matrix completion is far from complete. 

Our approach to address these issues is summarized as 
follows. 

1) We provide a framework for searching for a low-rank
matrix that is consistent with the partial observations. 
There is no requirement that such a matrix is unique: if 
there is a unique low-rank solution, we should be able 
to find this unique matrix; otherwise, it suffices to find 
just one solution that agrees with the revealed entries. In 
our approach, we assume that the rank of the underlying 
low-rank matrix is known a priori. Finding a consistent 
low-rank matrix is equivalent to finding a consistent 
column/row space. This is different from the OptSpace 
algorithm in [16], where the search is performed on both
column and row spaces simultaneously. 

2) We propose a geometric performance metric to measure 
the consistency between the estimated column space and 
the partial observations. In the literature, the standard 
approach is to minimize an objective function that is 
defined via the Frobenius norm. As we shall illustrate 
with explicit examples, this objective function may have 
singularities, and therefore the corresponding solution 
set may not be closed. Hence, we introduce a new for- 
mulation where consistency is now defined in geometric 
terms. This allows us to address the difficulties related 
to the Frobenius metric. In particular, we show that 
our geometric objective function is always continuous. 
The set of the corresponding consistent solutions is the 
closure of the set corresponding to the Frobenius norm. 
This new metric allows for provably strong performance 
guarantees, described below. 

3) We provide strong performance guarantees for special
completion scenarios: rank-one matrices with arbitrary
sampling patterns, and fully sampled matrices^1 of arbitrary
rank. For these two scenarios, a gradient descent
search starting from a random point will converge to a
global minimum with probability one. More importantly,
if the partial observations admit a unique consistent
solution, this search procedure finds this unique solution
with probability one. The performance guarantees are
different from those previously established in the literature.
Roughly speaking, previous performance guarantees
require large matrix sizes and only hold with high
probability. Ours hold with probability one regardless
of matrix size. It is also worth noting that we do not
require incoherence conditions, which are essential for
the performance guarantees of nuclear norm minimization.
Unfortunately, we are presently unable to obtain
performance guarantees for more general cases.

^1 For fully sampled matrices, even though a simple singular value
decomposition produces a consistent column space, it is not clear that a
randomly initialized search would converge to a consistent column space.
In what follows, we prove that this is the case.

The paper is organized as follows. In Section II, we
introduce the low-rank matrix completion problem and some
background material regarding Grassmann manifolds and their
geometry. In Section III, we show that formulating the low-rank
matrix completion problem as an optimization problem
using the Frobenius norm may yield singularities which can
obstruct standard minimization algorithms. We then propose
a new geometric formulation of the problem as a remedy
to this difficulty. This new formulation allows for strong
performance guarantees that are presented in Section IV.
Section V summarizes the main contributions of the work.
Proofs of the main results are presented in the Appendices. 

II. Low-Rank Matrix Completion and 
Preliminaries 

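Let $X \in \mathbb{R}^{m \times n}$ be an unknown matrix with rank $r \le \min(m, n)$, and let $\Omega \subset [m] \times [n]$ be the set of indices of the observed entries, where $[K] = \{1, 2, \cdots, K\}$. Define the projection operator $\mathcal{P}_\Omega$ by

$$\mathcal{P}_\Omega : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}, \quad X \mapsto X_\Omega, \quad \text{where } (X_\Omega)_{i,j} = \begin{cases} X_{i,j} & \text{if } (i,j) \in \Omega, \\ 0 & \text{otherwise}. \end{cases}$$

The consistent matrix completion problem is to find one rank-$r$ matrix $X'$ that is consistent with the observations $X_\Omega$, i.e.,

$$(P0): \text{ find an } X' \text{ such that } \operatorname{rank}(X') = r \text{ and } \mathcal{P}_\Omega(X') = \mathcal{P}_\Omega(X) = X_\Omega. \tag{1}$$

By definition, this problem is well defined since $X_\Omega$ is obtained from some rank-$r$ matrix $X$, which is therefore a solution. As in other works [10], [15], [16], we assume that the rank $r$ is given. In practice, one may try to sequentially guess a rank bound until a satisfactory solution has been found. We also introduce the (standard) projection operator $\mathcal{P}$,

$$\mathcal{P} : \mathbb{R}^{m} \times \mathbb{R}^{m \times k} \to \mathbb{R}^{m}, \qquad \mathcal{P}(x, U) = y = U U^{\dagger} x,$$

where $1 \le k \le m$, and where the superscript $\dagger$ denotes the pseudoinverse of a matrix. Let $\operatorname{span}(U)$ denote the subspace spanned by the columns of the matrix $U$, i.e.,

$$\operatorname{span}(U) = \{ v : v = Uw \text{ for some } w \in \mathbb{R}^{k} \}.$$

One can describe $\mathcal{P}(x, U)$, in geometric terms, as the projection of the vector $x$ onto $\operatorname{span}(U)$. It should be observed that $U^{\dagger} x$ is the global minimizer of the quadratic optimization problem $\min_{w \in \mathbb{R}^{k}} \| x - Uw \|_F^2$.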
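The relation between the projection operator and the least-squares problem can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `project` and the random test data are illustrative choices, not part of the paper.

```python
import numpy as np

def project(x, U):
    """P(x, U) = U U^+ x: orthogonal projection of x onto span(U)."""
    return U @ (np.linalg.pinv(U) @ x)

rng = np.random.default_rng(0)
U = rng.standard_normal((5, 2))   # illustrative 5 x 2 matrix
x = rng.standard_normal(5)

y = project(x, U)
w_pinv = np.linalg.pinv(U) @ x                   # U^+ x
w_lstsq = np.linalg.lstsq(U, x, rcond=None)[0]   # argmin_w ||x - U w||^2

print(np.allclose(w_pinv, w_lstsq))      # both routes give the same minimizer
print(np.allclose(U.T @ (x - y), 0.0))   # residual is orthogonal to span(U)
```

Projecting a second time leaves `y` unchanged, which is the usual sanity check for an orthogonal projection.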
A. Search for a consistent column space

We now show that the problem (P0) is equivalent to finding a column space consistent with the observed entries of $X$.

Let $\mathcal{U}_{m,r}$ be the set of $m \times r$ matrices with $r$ orthonormal columns, i.e., $\mathcal{U}_{m,r} = \{ U \in \mathbb{R}^{m \times r} : U^T U = I_r \}$. Define the function $f_F : \mathcal{U}_{m,r} \to \mathbb{R}$ by setting

$$f_F(U) = \min_{W \in \mathbb{R}^{n \times r}} \left\| X_\Omega - \mathcal{P}_\Omega\left(U W^T\right) \right\|_F^2, \tag{2}$$

where $\|\cdot\|_F$ denotes the Frobenius norm. This function measures the consistency between the matrix $U$ and the observations $X_\Omega$. In particular, if $f_F(U) = 0$, then there exists a matrix $W$ such that the rank-$r$ matrix $U W^T$ satisfies $\mathcal{P}_\Omega(U W^T) = X_\Omega$. Hence, the consistent matrix completion problem is equivalent to



$$(P1): \text{ find } U \in \mathcal{U}_{m,r} \text{ such that } f_F(U) = 0. \tag{3}$$

In fact, $f_F(U)$ depends only on the subspace $\operatorname{span}(U)$ since the columns of a matrix of the form $U W^T$ all lie in $\operatorname{span}(U)$. Hence, to solve the consistent matrix completion problem, it suffices to find a column space that is consistent with the observed entries. Note that the same conclusion holds for the row space as well. For simplicity, we restrict our attention to the column space only.

B. Grassmann Manifolds

The set of column spaces of elements in $\mathcal{U}_{m,r}$ can be identified with the Grassmann manifold $\mathcal{G}_{m,r}$, the set of $r$-dimensional subspaces in the $m$-dimensional Euclidean space $\mathbb{R}^m$. This is a smooth compact manifold of dimension $r(m - r)$. Conversely, every element, say $\mathscr{U} \in \mathcal{G}_{m,r}$, can be presented by a generator matrix $U \in \mathcal{U}_{m,r}$ satisfying $\operatorname{span}(U) = \mathscr{U}$. However, this presentation of $\mathscr{U}$ by a generator matrix is clearly not unique. Nevertheless, it follows from the discussion in the previous section that the function $f_F$ descends to a function on $\mathcal{G}_{m,r}$. Thus, problem (P1) can be viewed as an optimization problem on the compact manifold $\mathcal{G}_{m,r}$.

In this section we recall some facts concerning the geometry of Grassmann manifolds which will be useful in addressing this and similar optimization problems. For the proofs of these facts the reader is referred to [18]. We begin by recalling the construction of the standard Riemannian metric, $g_{m,r}$, on $\mathcal{G}_{m,r}$. Note that the group $\mathcal{U}_{m,m}$ of orthogonal $m \times m$ matrices acts transitively on $\mathcal{G}_{m,r}$ (by multiplication on generator matrices). More precisely, $\mathcal{G}_{m,r}$ can be described as a quotient of $\mathcal{U}_{m,m}$, i.e.,

$$\mathcal{G}_{m,r} \cong \mathcal{U}_{m,m} / \left( \mathcal{U}_{m-r,m-r} \times \mathcal{U}_{r,r} \right).$$

Now, as a compact Lie group, $\mathcal{U}_{m,m}$ has a standard (bi-invariant) Riemannian metric (which can be defined by using the inner product in the tangent space). This descends to the quotient $\mathcal{G}_{m,r}$ as the metric $g_{m,r}$. By construction, $g_{m,r}$ is invariant under the action of $\mathcal{U}_{m,m}$.

The metric $g_{m,r}$ determines a chordal distance function and geodesic curves on $\mathcal{G}_{m,r}$ which will play an important role in what follows. To obtain the relevant formulas for these objects we require the notion of the principal angles between two subspaces [19], [20]. Consider the subspaces $\operatorname{span}(U)$ and $\operatorname{span}(V)$ of $\mathbb{R}^m$ for some $U \in \mathcal{U}_{m,p}$ and $V \in \mathcal{U}_{m,q}$. The principal angles between these two subspaces can be defined in the following constructive manner. Without loss of generality, assume that $1 \le p \le q \le m$. Let $u_1 \in \operatorname{span}(U)$ and $v_1 \in \operatorname{span}(V)$ be unit-length vectors such that $|u_1^T v_1|$ is maximal. Inductively, let $u_k \in \operatorname{span}(U)$ and $v_k \in \operatorname{span}(V)$ be unit vectors such that $u_k^T u_j = 0$ and $v_k^T v_j = 0$ for all $1 \le j < k$ and $|u_k^T v_k|$ is maximal. The principal angles are then defined as

$$\alpha_k = \arccos u_k^T v_k$$

for $k = 1, 2, \cdots, p$.

Alternatively, the principal angles can be computed via singular value decomposition. Consider the singular value decomposition $U U^T V V^T = \bar{U} \Lambda \bar{V}^T$, where $\bar{U} \in \mathcal{U}_{m,p}$ and $\bar{V} \in \mathcal{U}_{m,p}$ contain the first $p$ left and right singular vectors, respectively, and $\Lambda \in \mathbb{R}^{p \times p}$ is a diagonal matrix comprised of singular values $\lambda_1 \ge \cdots \ge \lambda_p$. Then the $k$-th columns of $\bar{U}$ and $\bar{V}$ correspond to the vectors $u_k$ and $v_k$ used in the constructive definition, respectively. The $k$-th singular value defines the $k$-th principal angle via

$$\cos \alpha_k = \lambda_k.$$
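The SVD characterization of the principal angles can be sketched in a few lines. This is an illustrative example assuming NumPy; the helper name `principal_angles` is hypothetical, and it uses the smaller product $U^T V$, whose singular values coincide with the first $p$ singular values of $U U^T V V^T$ when $U$ and $V$ have orthonormal columns.

```python
import numpy as np

def principal_angles(U, V):
    """Principal angles between span(U) and span(V) for orthonormal U, V."""
    lam = np.linalg.svd(U.T @ V, compute_uv=False)   # cosines, in decreasing order
    return np.arccos(np.clip(lam, 0.0, 1.0))         # clip guards against round-off

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((6, 2)))     # U in U_{6,2}
V, _ = np.linalg.qr(rng.standard_normal((6, 3)))     # V in U_{6,3}
p = U.shape[1]

alpha = principal_angles(U, V)
# Same cosines obtained from the m x m product used in the text: U U^T V V^T.
lam_text = np.linalg.svd(U @ U.T @ V @ V.T, compute_uv=False)[:p]
print(np.allclose(np.cos(alpha), lam_text))
```

Working with the $p \times q$ matrix $U^T V$ rather than the $m \times m$ product is only a computational convenience; both give the same angles.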

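Chordal distance on $\mathcal{G}_{m,r}$. For $U_1$ and $U_2$ in $\mathcal{U}_{m,r}$, the chordal distance between the two subspaces $\operatorname{span}(U_1)$ and $\operatorname{span}(U_2)$ in $\mathcal{G}_{m,r}$ is given, in terms of the principal angles $\alpha_1, \cdots, \alpha_r$ between them, via the formula

$$\sqrt{\sum_{k=1}^{r} \sin^2 \alpha_k}.$$

The chordal distance can also be expressed in terms of singular values as

$$\sqrt{\sum_{k=1}^{r} \left( 1 - \lambda_k^2 \right)}.$$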
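As a quick numerical illustration, the two expressions for the chordal distance can be compared directly. The sketch below assumes NumPy; `chordal_distance` is an illustrative name, and the random subspaces are placeholders.

```python
import numpy as np

def chordal_distance(U1, U2):
    """Chordal distance between span(U1) and span(U2); U1, U2 in U_{m,r}."""
    lam = np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), 0.0, 1.0)
    via_angles = np.sqrt(np.sum(np.sin(np.arccos(lam)) ** 2))
    via_singular_values = np.sqrt(np.sum(1.0 - lam ** 2))
    return via_angles, via_singular_values

rng = np.random.default_rng(2)
U1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # change of basis inside span(U1)

d_a, d_s = chordal_distance(U1, U2)
d_same, _ = chordal_distance(U1, U1 @ Q)           # same subspace, different generator
print(d_a, d_s, d_same)
```

The distance depends only on the subspaces: replacing `U1` by `U1 @ Q` for an orthogonal `Q` leaves it (numerically) at zero.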



Geodesics on $\mathcal{G}_{m,r}$. We will use the gradient descent method on $\mathcal{G}_{m,r}$ to search for consistent column spaces. This will require some information concerning the geodesics of the metric $g_{m,r}$ on $\mathcal{G}_{m,r}$, which we now recall.

Roughly speaking, a geodesic in a manifold is a generalization of the notion of a straight line in the Euclidean space: given any two points in $\mathcal{G}_{m,r}$, among all curves that connect these two points, the one of the shortest length is a geodesic. More precisely, fix a subspace $\mathscr{U}$ in $\mathcal{G}_{m,r}$ and a tangent vector $\mathscr{H}$ to $\mathcal{G}_{m,r}$ at $\mathscr{U}$. Let $U \in \mathcal{U}_{m,r}$ be a generator matrix for $\mathscr{U}$. The tangent space to $\mathcal{G}_{m,r}$ at $\mathscr{U}$ can be identified with the set of horizontal tangent vectors to $U$, i.e., the set of tangent vectors $W$ at $U$ which satisfy $U^T W = 0$ [18]. Let $H \in \mathbb{R}^{m \times r}$ be the horizontal tangent vector at $U$ which corresponds to $\mathscr{H}$ and set

$$U(t) = \left[ U V_H, \; U_H \right] \begin{bmatrix} \cos(S_H t) \\ \sin(S_H t) \end{bmatrix} V_H^T, \tag{4}$$

where $U_H S_H V_H^T$ is the compact singular value decomposition of $H$. Then $\operatorname{span}(U(t))$ is the unique geodesic of $g_{m,r}$ which starts at $\mathscr{U}$ with "initial velocity" $\mathscr{H}$.
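Formula (4) translates almost literally into code. The following is a minimal sketch, assuming NumPy and a horizontal tangent vector obtained by projecting a random matrix; `grassmann_geodesic` is an illustrative name rather than an established API.

```python
import numpy as np

def grassmann_geodesic(U, H, t):
    """U(t) = [U V_H, U_H] [cos(S_H t); sin(S_H t)] V_H^T for horizontal H (U^T H = 0)."""
    U_H, s, Vt = np.linalg.svd(H, full_matrices=False)   # compact SVD: H = U_H diag(s) Vt
    V_H = Vt.T
    return (U @ V_H) @ np.diag(np.cos(s * t)) @ Vt + U_H @ np.diag(np.sin(s * t)) @ Vt

rng = np.random.default_rng(3)
m, r = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
Z = rng.standard_normal((m, r))
H = Z - U @ (U.T @ Z)             # project onto the horizontal space, so U^T H = 0

Ut = grassmann_geodesic(U, H, 0.7)
eps = 1e-6                        # central difference to estimate the initial velocity
velocity = (grassmann_geodesic(U, H, eps) - grassmann_geodesic(U, H, -eps)) / (2 * eps)
print(np.allclose(Ut.T @ Ut, np.eye(r)), np.allclose(velocity, H, atol=1e-5))
```

The two checks confirm that the path stays in $\mathcal{U}_{m,r}$ and that its derivative at $t = 0$ equals $H$.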
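We now use this general solution for the geodesic flow of $g_{m,r}$ to establish the following technical result concerning geodesics between a given pair of subspaces.

Lemma 1: Fix two elements $U_1$ and $U_2$ of $\mathcal{U}_{m,r}$. Let $V_1 \Lambda V_2^T$ be the singular value decomposition of the matrix $U_1^T U_2$, and denote the $i$-th singular value by $\lambda_i = \cos \alpha_i$. Set $\bar{U}_1 = U_1 V_1$ and $\bar{U}_2 = U_2 V_2$ and note that $\bar{U}_1^T \bar{U}_2 = \Lambda$.

1) Consider the path

$$U(t) = \left[ \bar{U}_1, \; G \right] \begin{bmatrix} \operatorname{diag}([\cdots, \cos \alpha_i t, \cdots]) \\ \operatorname{diag}([\cdots, \sin \alpha_i t, \cdots]) \end{bmatrix} V_1^T, \tag{5}$$

where the columns of $G = [\cdots, g_i, \cdots] \in \mathbb{R}^{m \times r}$ are defined as follows:

$$g_i = \begin{cases} \dfrac{\bar{U}_{2,:,i} - \lambda_i \bar{U}_{1,:,i}}{\left\| \bar{U}_{2,:,i} - \lambda_i \bar{U}_{1,:,i} \right\|} & \text{if } \lambda_i \ne 1, \\ 0 & \text{if } \lambda_i = 1. \end{cases}$$

Here, the subscript $:,i$ denotes the $i$-th column of the corresponding matrix. Then the path $\operatorname{span}(U(t))$ is a geodesic of $g_{m,r}$ such that $\operatorname{span}(U(0)) = \operatorname{span}(U_1)$ and $\operatorname{span}(U(1)) = \operatorname{span}(U_2)$.

2) Let $x \in \operatorname{span}(U_2)$ be a unit-norm vector. It is clear that there exists a unique $w \in \mathcal{U}_{r,1}$ such that $x = \bar{U}_2 w$. Suppose that $x \notin \operatorname{span}(U_1)$. Let $k$ be the number of the singular values of $U_1^T U_2$ that equal one. Then $k < r$ and there exists an index $j \in [r]$ such that $k < j \le r$ and $w_j \ne 0$.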
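The path in part 1) of the lemma can be built explicitly and its endpoints checked. This is a hedged sketch assuming NumPy; `lemma1_path` and the tolerance `1e-12` used to detect $\lambda_i = 1$ are illustrative choices.

```python
import numpy as np

def lemma1_path(U1, U2, t):
    """Path (5): starts at U1 for t = 0 and spans span(U2) at t = 1."""
    V1, lam, V2t = np.linalg.svd(U1.T @ U2)      # U1^T U2 = V1 diag(lam) V2^T
    lam = np.clip(lam, 0.0, 1.0)
    alpha = np.arccos(lam)
    U1b, U2b = U1 @ V1, U2 @ V2t.T               # rotated generators: U1b^T U2b = diag(lam)
    D = U2b - U1b * lam                          # column i: U2b[:, i] - lam_i * U1b[:, i]
    norms = np.linalg.norm(D, axis=0)
    G = np.zeros_like(D)
    nz = norms > 1e-12                           # g_i = 0 when lam_i = 1
    G[:, nz] = D[:, nz] / norms[nz]
    return (U1b * np.cos(alpha * t) + G * np.sin(alpha * t)) @ V1.T

rng = np.random.default_rng(4)
U1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((6, 2)))

end = lemma1_path(U1, U2, 1.0)
half = lemma1_path(U1, U2, 0.5)
print(np.allclose(end @ end.T, U2 @ U2.T))       # same subspace as span(U2)
```

Comparing the projectors `end @ end.T` and `U2 @ U2.T` tests equality of subspaces without depending on the choice of generator matrix.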

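Proof: Clearly, $U(0) = U_1$. Since $\bar{U}_1^T \bar{U}_2 = \Lambda$, we have

$$\left\| \bar{U}_{2,:,i} - \lambda_i \bar{U}_{1,:,i} \right\|^2 = 1 - 2 \lambda_i \left\langle \bar{U}_{2,:,i}, \bar{U}_{1,:,i} \right\rangle + \lambda_i^2 = 1 - \lambda_i^2.$$

Thus, we have

$$\begin{aligned} U(1) &= \left[ \cdots, \bar{U}_{1,:,i} \cos \alpha_i + g_i \sin \alpha_i, \cdots \right] V_1^T \\ &= \left[ \cdots, \bar{U}_{1,:,i} \lambda_i + g_i \left\| \bar{U}_{2,:,i} - \lambda_i \bar{U}_{1,:,i} \right\|, \cdots \right] V_1^T \\ &= \left( \bar{U}_1 \Lambda + \left( \bar{U}_2 - \bar{U}_1 \Lambda \right) \right) V_1^T \\ &= U_2 V_2 V_1^T. \end{aligned}$$

Hence, $\operatorname{span}(U(1)) = \operatorname{span}(U_2)$. To prove the first part of the lemma it just remains to show that $\operatorname{span}(U(t))$ is a geodesic. Setting $H = \dot{U}(0)$ we have

$$H = G \operatorname{diag}([\cdots, \alpha_i, \cdots]) V_1^T. \tag{6}$$

We first verify that the tangent vector $H$ is horizontal, which is equivalent to showing that $U_1^T H = 0$. According to the definition of the vectors $g_i$, when $\lambda_i \ne 1$, one has

$$\left\| \bar{U}_{2,:,i} - \lambda_i \bar{U}_{1,:,i} \right\| \ne 0$$

and

$$\bar{U}_1^T g_i = \frac{1}{\left\| \bar{U}_{2,:,i} - \lambda_i \bar{U}_{1,:,i} \right\|} \left( \lambda_i e_i - \lambda_i e_i \right) = 0.$$

Hence,

$$U_1^T G = V_1 \bar{U}_1^T G = 0.$$

By (6), this implies that $U_1^T H = 0$, as desired. Note that equation (6) can also be viewed as an expression for the compact singular value decomposition of $H$. It then follows directly from (4) that $\operatorname{span}(U(t))$ is indeed a geodesic.

To prove the second part of the lemma, let $\bar{u}_{1,1}, \cdots, \bar{u}_{1,r}$ and $\bar{u}_{2,1}, \cdots, \bar{u}_{2,r}$ be the column vectors of the matrices $\bar{U}_1$ and $\bar{U}_2$, respectively. By assumption, $\lambda_1 = \cdots = \lambda_k = 1$ and $1 > \lambda_{k+1} \ge \cdots \ge \lambda_r$. Hence,

$$\bar{u}_{1,j} = \bar{u}_{2,j} \text{ for all } j \le k, \quad \text{and} \quad \left\langle \bar{u}_{1,j}, \bar{u}_{2,j} \right\rangle = \lambda_j < 1 \text{ for all } k < j \le r.$$

Suppose that $k = r$. Then

$$x = \bar{U}_2 w = \bar{U}_1 w \in \operatorname{span}(U_1),$$

which contradicts the assumption that $x \notin \operatorname{span}(U_1)$. Hence, we have $k < r$. Now suppose that $w_{k+1} = \cdots = w_r = 0$. Then

$$x = \sum_{j=1}^{k} \bar{u}_{2,j} w_j = \sum_{j=1}^{k} \bar{u}_{1,j} w_j \in \operatorname{span}(U_1),$$

which again contradicts the assumption that $x \notin \operatorname{span}(U_1)$. Hence, there exists a $j$ such that $k < j \le r$ and $w_j \ne 0$. This completes the proof. ■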

An invariant measure on $\mathcal{G}_{m,r}$. The space $\mathcal{U}_{m,m}$ admits a standard invariant measure (the Haar measure) [21]. This descends to a measure $\mu$ on $\mathcal{G}_{m,r}$ which is also invariant in the following sense: for any measurable set $\mathcal{M} \subset \mathcal{G}_{m,r}$ and any $A \in \mathcal{U}_{m,m}$, one has $\mu(\mathcal{M}) = \mu(A\mathcal{M})$, where $A\mathcal{M} = \{ \operatorname{span}(AU) : U \in \mathcal{U}_{m,r}, \; \operatorname{span}(U) \in \mathcal{M} \}$ [21], [20]. This invariant measure defines the uniform/isotropic distribution on the Grassmann manifold. Furthermore, let $\operatorname{span}(U) \in \mathcal{G}_{m,r}$ be fixed and $\operatorname{span}(V) \in \mathcal{G}_{m,r}$ be drawn randomly from the isotropic distribution. The joint probability density function of the principal angles between the spans of $U$ and $V$ is explicitly given in [21], [22], [20], [23]. Two properties of this density function will be relevant to our later analysis: first, it is independent of the choice of $U$; second, there is no mass point.
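One common way to draw from the isotropic distribution, not described in the text but standard, is to orthonormalize a matrix with independent Gaussian entries. The sketch below assumes NumPy and uses the hypothetical helper `sample_isotropic`; it also illustrates the "no mass point" property by checking that no sampled principal angle equals $\pi/2$.

```python
import numpy as np

def sample_isotropic(m, r, rng):
    """Generator matrix whose span is uniformly distributed on the Grassmann manifold."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, r)))   # column space of a Gaussian matrix
    return Q

rng = np.random.default_rng(5)
m, r = 8, 3
U = sample_isotropic(m, r, rng)                        # fixed reference subspace

# Smallest cosine of a principal angle over many random draws of V.
min_cos = min(
    np.linalg.svd(U.T @ sample_isotropic(m, r, rng), compute_uv=False).min()
    for _ in range(1000)
)
print(min_cos > 0.0)   # every principal angle stays strictly below pi/2
```

A strictly positive `min_cos` is what one expects when the event "some principal angle equals $\pi/2$" has probability zero.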

III. From the Frobenius Norm to the Geometric 
Metric 

In the previous section, we showed that the matrix comple- 
tion problem reduces to a search for a consistent column space. 
In other words, one only needs to find a global minimum of 
the objective function $f_F(U)$, where

$$f_F(U) = \min_{W \in \mathbb{R}^{r \times n}} \left\| X_\Omega - \mathcal{P}_\Omega(UW) \right\|_F^2. \tag{7}$$

However, as we shall show in Section III-A, this approach has a serious drawback: the objective function (7) is not a continuous function of the variable $U$. The discontinuity of the objective function is due to the composition of the Frobenius norm with the projection operator $\mathcal{P}_\Omega$. It may prevent gradient-descent-based algorithms from converging to a global optimum (see [17] for a detailed example). To address this issue, we propose another objective function $f_G(U)$ based on the geometry of the problem, detailed in Section III-B. To solve the matrix completion problem, one then needs to solve the problem

$$(P2): \text{ find a } U \in \mathcal{U}_{m,r} \text{ such that } f_G(U) = 0, \tag{8}$$

where $f_G$ denotes the geometric metric, which is formally defined in Section III-B.

In the rest of this section, we shall show that the new objective function $f_G$ is a continuous function. Furthermore, we shall show that the preimage of $f_G(U) = 0$ is the closure of the preimage of $f_F(U) = 0$. Because of these nice properties of the geometric objective function, one can derive strong performance guarantees for gradient descent methods, as described in Section IV.

A. Why the Frobenius Norm Fails 

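We use an example to show that the objective function (7) based on the Frobenius norm is not continuous. Let $x_{\Omega,i}$ be the $i$-th column of the matrix $X_\Omega$. Let $\Omega_i \subset [m]$ be the set of indices of known entries in the $i$-th column. We use $\mathcal{P}_{\Omega,i}$ to denote the projection operator corresponding to the index set $\Omega_i$. By additivity of the squared Frobenius norm, the objective function can be written as a sum of atomic functions, i.e.,

$$f_F(U) = \min_{W} \left\| X_\Omega - \mathcal{P}_\Omega(UW) \right\|_F^2 = \sum_{i=1}^{n} \min_{w_i \in \mathbb{R}^r} \left\| x_{\Omega,i} - \mathcal{P}_{\Omega,i}(U w_i) \right\|_F^2.$$

Denote the $i$-th atomic function by $f_{F,i}(U)$. It can be verified that

$$f_{F,i}(U) = \min_{w \in \mathbb{R}^r} \left\| x_{\Omega,i} - \mathcal{P}_{\Omega,i}(Uw) \right\|_F^2 = \left\| x_{\Omega,i} - \mathcal{P}\left( x_{\Omega,i}, U_{\Omega_i} \right) \right\|_F^2,$$

where $U_{\Omega_i} = \left[ \mathcal{P}_{\Omega,i}(u_1), \cdots, \mathcal{P}_{\Omega,i}(u_r) \right]$ and $u_1, \cdots, u_r$ are the column vectors of the matrix $U$. We show in the next example that an atomic function, say $f_{F,1}(U)$, may not be continuous.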

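Example 1: Suppose that $x_{\Omega,1} = [0, 1, 1]^T$ and $\Omega_1 = \{2, 3\}$. Let $U$ be of the form $U = [\sqrt{1 - 2\epsilon^2}, \epsilon, \epsilon]^T \in \mathcal{U}_{3,1}$, where $\epsilon \in [-1/\sqrt{2}, 1/\sqrt{2}]$. For a given $U$, the atomic function $f_{F,1}(U)$ is given by

$$f_{F,1}(U) = \min_{w \in \mathbb{R}} \left\| x_{\Omega,1} - \mathcal{P}_{\Omega,1}(Uw) \right\|^2.$$

This is a quadratic optimization problem and can be easily solved. The optimal $w^*$ is given by

$$w^* = \begin{cases} 1/\epsilon & \text{if } \epsilon \ne 0, \\ 0 & \text{if } \epsilon = 0. \end{cases}$$

Hence, one has

$$f_{F,1}(U(\epsilon)) = \begin{cases} 0 & \text{if } \epsilon \ne 0, \\ 2 & \text{if } \epsilon = 0, \end{cases}$$

which shows that $f_{F,1}(U(\epsilon))$ has a singular point at $\epsilon = 0$.

Figure 1. Contours projected to the $(u_2, u_3)$ plane. The left depicts the contours of the squared Frobenius norm. The right corresponds to the chordal distance.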
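The jump in Example 1 is easy to reproduce numerically. The sketch below assumes NumPy; `U_of` and `f_F1` are illustrative names, and the least-squares problem is solved with the pseudoinverse of the masked column, which returns $w^* = 0$ when the masked column vanishes.

```python
import numpy as np

x = np.array([0.0, 1.0, 1.0])    # x_{Omega,1}
omega = [1, 2]                   # Omega_1 = {2, 3}, written with zero-based indices

def U_of(eps):
    """The one-parameter family U(eps) = [sqrt(1 - 2 eps^2), eps, eps]^T."""
    return np.array([np.sqrt(1.0 - 2.0 * eps ** 2), eps, eps])

def f_F1(U):
    """min_w ||x - P_Omega(U w)||^2, using w* = U_Omega^+ x."""
    U_masked = np.zeros_like(U)
    U_masked[omega] = U[omega]                           # keep only the observed rows
    w = np.linalg.pinv(U_masked.reshape(-1, 1)) @ x      # length-1 array holding w*
    return float(np.sum((x - U_masked * w[0]) ** 2))

for eps in (1e-1, 1e-3, 1e-6, 0.0):
    print(eps, f_F1(U_of(eps)))
```

However small the nonzero `eps`, the observed entries can be matched exactly, while at `eps = 0` nothing can be matched; this is the discontinuity the example describes.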

It is straightforward to verify that the overall objective
function (7) is also a discontinuous function of U. As we
argued in [17], this discontinuity creates so-called barriers,
which may prevent gradient-descent algorithms from converging
to a global minimum. Hence, one seeks an optimization
criterion that will allow for a continuous objective function and,
consequently, no search path barriers.

B. A Geometric Metric 

To address the problem due to the singularities of the objective functions, we propose to replace the Frobenius norm by a geometric performance metric.

In this case, the objective function is defined as

$$f_G(U) = \sum_{i=1}^{n} f_{G,i}(U),$$

where $f_{G,i}(U)$ denotes the geometric metric corresponding to the $i$-th column, defined as follows. If $x_{\Omega,i} = 0$, we set $f_{G,i}(U) = 0$. Henceforth, we only consider the case when $x_{\Omega,i} \ne 0$. For any $x_{\Omega,i} \ne 0$, let $\bar{x}_{\Omega,i} = x_{\Omega,i} / \| x_{\Omega,i} \|_F$ be the normalized vector $x_{\Omega,i}$. Let $\Omega_i^c = \{1, 2, \cdots, m\} \setminus \Omega_i$ be the complement of $\Omega_i$. Let $e_k \in \mathbb{R}^m$ be the $k$-th natural basis vector, i.e., the $k$-th entry of $e_k$ equals one and all other entries are zero. Define

$$B_i = \left[ \bar{x}_{\Omega,i}, e_{k_1}, \cdots, e_{k_\ell} \right], \tag{9}$$

where $\{k_1, \cdots, k_\ell\} = \Omega_i^c$. Let $\lambda_{\max}(B_i^T U)$ be the largest singular value of the matrix $B_i^T U$. Then

$$f_{G,i}(U) = 1 - \lambda_{\max}^2\left( B_i^T U \right). \tag{10}$$

This expression is closely related to the chordal distance between two subspaces, as described in Section II-B. We henceforth refer to the function (10) either as the geometric metric (10), or with slight abuse of terminology, as the chordal distance.
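The definition in (9) and (10) can be turned into a short routine. This is a minimal sketch assuming NumPy; the function names, the boolean-mask representation of $\Omega$, and the small test matrix are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

def f_G_column(U, x_obs, observed):
    """f_{G,i}(U) = 1 - lambda_max^2(B_i^T U) for a single column.

    x_obs:    length-m column with zeros at unobserved positions.
    observed: boolean mask of length m marking Omega_i.
    """
    if not np.any(x_obs[observed]):
        return 0.0                                  # convention when x_{Omega,i} = 0
    m = U.shape[0]
    x_bar = np.where(observed, x_obs, 0.0)
    x_bar = x_bar / np.linalg.norm(x_bar)
    E = np.eye(m)[:, ~observed]                     # e_k for k in the complement of Omega_i
    B = np.column_stack([x_bar, E])                 # B_i as in (9)
    lam_max = np.linalg.svd(B.T @ U, compute_uv=False)[0]
    return 1.0 - lam_max ** 2

def f_G(U, X_obs, mask):
    """Sum of the column-wise geometric metrics."""
    return sum(f_G_column(U, X_obs[:, i], mask[:, i]) for i in range(X_obs.shape[1]))

rng = np.random.default_rng(6)
m, n, r = 6, 5, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-2 matrix
mask = np.ones((m, n), dtype=bool)
mask[np.arange(n), np.arange(n)] = False                        # hide one entry per column
X_obs = np.where(mask, X, 0.0)

U_true = np.linalg.svd(X, full_matrices=False)[0][:, :r]        # true column space
U_rand, _ = np.linalg.qr(rng.standard_normal((m, r)))
print(f_G(U_true, X_obs, mask), f_G(U_rand, X_obs, mask))
```

The true column space gives a value that is zero up to round-off, while a random subspace generally gives a strictly positive value.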

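One advantage of the chordal distance is its continuity. This follows directly from the continuity of the singular values of the underlying matrix. Recall Example 1. In Fig. 1, we illustrate the differences between $f_{F,1}$ and $f_{G,1}$ by projecting their contours of constant value onto the $u_2$-$u_3$ plane.

More importantly, the following theorem shows that the preimage of $f_{G,i}(U) = 0$ is actually the closure of the preimage of $f_{F,i}(U) = 0$.

Theorem 1: Given $x_{\Omega,i} \in \mathbb{R}^m$ and $\Omega_i \subset [m]$. Let $U_{\Omega_i} \in \mathbb{R}^{m \times r}$ be such that $(U_{\Omega_i})_{k,\ell} = U_{k,\ell}$ if $k \in \Omega_i$, and $(U_{\Omega_i})_{k,\ell} = 0$ if $k \notin \Omega_i$. Define

$$\mathcal{U}_{F,i} = \left\{ U \in \mathcal{U}_{m,r} : f_{F,i}(U) = \left\| x_{\Omega,i} - \mathcal{P}\left( x_{\Omega,i}, U_{\Omega_i} \right) \right\|^2 = 0 \right\}$$

and

$$\mathcal{U}_{G,i} = \left\{ U \in \mathcal{U}_{m,r} : f_{G,i}(U) = 1 - \lambda_{\max}^2\left( B_i^T U \right) = 0 \right\}.$$

Then $\mathcal{U}_{G,i}$ is the closure of $\mathcal{U}_{F,i}$, i.e., $\mathcal{U}_{G,i} = \overline{\mathcal{U}_{F,i}}$.

The proof is given in Appendix A. Although this theorem deals with only one column of the observed matrix, the result can be easily extended to the whole matrix $X_\Omega$: let $\mathcal{U}_F = \bigcap_{i=1}^{n} \mathcal{U}_{F,i}$ and

$$\mathcal{U}_G = \bigcap_{i=1}^{n} \mathcal{U}_{G,i} = \left\{ U \in \mathcal{U}_{m,r} : \lambda_{\max}\left( U^T B_i \right) = 1 \text{ for all } i \right\}; \tag{11}$$

then $\mathcal{U}_G = \overline{\mathcal{U}_F}$.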

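Example 1 (Continued): For $U = [\sqrt{1 - 2\epsilon^2}, \epsilon, \epsilon]^T$, one has $B_1 = [\bar{x}_{\Omega,1}, e_1]$ with $\bar{x}_{\Omega,1} = [0, 1/\sqrt{2}, 1/\sqrt{2}]^T$, and it can be seen that

$$\lambda_{\max}^2\left( B_1^T U \right) = 2\epsilon^2 + \left( 1 - 2\epsilon^2 \right) = 1.$$

Hence,

$$f_{G,1}(U) = 1 - \lambda_{\max}^2\left( B_1^T U \right) = 0.$$

As a result, within this family,

$$\mathcal{U}_{F,1} = \left\{ \left[ \sqrt{1 - 2\epsilon^2}, \epsilon, \epsilon \right]^T : \epsilon^2 \le \tfrac{1}{2} \text{ and } \epsilon \ne 0 \right\}$$

and

$$\mathcal{U}_{G,1} = \left\{ \left[ \sqrt{1 - 2\epsilon^2}, \epsilon, \epsilon \right]^T : \epsilon^2 \le \tfrac{1}{2} \right\}.$$

Clearly, $\mathcal{U}_{G,1} = \overline{\mathcal{U}_{F,1}}$.

C. Computations Related to the Chordal Distance

For a given performance metric, the computational complexity of the supporting optimization procedure is an important factor for assessing its practical value. In this subsection, we show that besides its continuity, the chordal distance and the related gradient can be computed efficiently. Hence, all the algorithmic solutions using gradient descent methods can be easily modified to accommodate the geometric distortion measure.

The principal angle $\theta_i$ and the chordal distance $\sin^2 \theta_i$ can be computed using the singular value decomposition. Given the $i$-th column of the observed matrix, one can form $B_i$ easily. Let $\lambda_i$ be the largest singular value of the matrix $B_i B_i^T U$, and let $b_i$ and $v_i$ be the corresponding left and right singular vectors, respectively^2. Following the definition of the chordal distance, one has $f_{G,i}(U) = \sin^2 \theta_i = 1 - \lambda_i^2$. Let $\nabla_U f_{G,i} \in \mathbb{R}^{m \times r}$ be a matrix such that

$$\left( \nabla_U f_{G,i} \right)_{k,\ell} = \frac{\partial f_{G,i}}{\partial U_{k,\ell}}.$$

It can be verified that

$$\nabla_U f_{G,i}(U) = -2 \cos \theta_i \frac{\partial \cos \theta_i}{\partial U} = -2 \lambda_i b_i v_i^T. \tag{12}$$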



Note that in the matrix completion problem, one only needs to search for a column space $\operatorname{span}(U)$ consistent with the observations. Taking this fact into consideration, the gradient restricted to the horizontal tangent space at $U$, denoted here by $\bar{\nabla}_U$, is given by [18]

$$\bar{\nabla}_U f_G = \sum_{i=1}^{n} \bar{\nabla}_U f_{G,i} = \left( I - U U^T \right) \sum_{i=1}^{n} \nabla_U f_{G,i}. \tag{13}$$

Switching from the Frobenius norm to the chordal distance does not introduce extra computational cost. Due to the particular structure of $B_i$, the matrix multiplication $B_i B_i^T U$ can be executed in $O(mr)$ steps. The resulting matrix has dimensions $m \times r$, where typically $r \ll m$. The major computational burden is incurred by the singular value decomposition. Computing the largest singular value and the corresponding singular vectors of an $m \times r$ matrix essentially reduces to computing the largest eigenvalue of an $r \times r$ matrix and the corresponding eigenvector. Hence, the overall complexity of computing $f_{G,i}$ is $O(mr^2 + r^3) = O(mr^2)$, where the $O(mr^2)$ and $O(r^3)$ terms come from matrix multiplication and eigenvalue computation, respectively. In comparison, solving the least squares problem in the definition of $f_{F,i}$ has an $O(mr^2)$ cost as well.
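The gradient expression $-2\lambda_i b_i v_i^T$ and its projection onto the horizontal space can be checked against finite differences. The sketch below assumes NumPy and a generic $U$ for which the largest singular value is simple; all function names are illustrative.

```python
import numpy as np

def fG_and_grad(U, B):
    """Return f = 1 - lambda^2 and the Euclidean gradient -2 * lambda * b v^T."""
    M = B @ (B.T @ U)                                   # B B^T U, an m x r matrix
    L, s, Vt = np.linalg.svd(M, full_matrices=False)
    lam, b, v = s[0], L[:, 0], Vt[0, :]                 # top singular triple
    return 1.0 - lam ** 2, -2.0 * lam * np.outer(b, v)

def tangent_gradient(U, grad):
    """Project a Euclidean gradient onto the horizontal space at U: (I - U U^T) grad."""
    return grad - U @ (U.T @ grad)

def numerical_gradient(U, B, h=1e-6):
    """Entry-wise central differences of f with respect to U."""
    G = np.zeros_like(U)
    for k in range(U.shape[0]):
        for l in range(U.shape[1]):
            E = np.zeros_like(U)
            E[k, l] = h
            G[k, l] = (fG_and_grad(U + E, B)[0] - fG_and_grad(U - E, B)[0]) / (2 * h)
    return G

rng = np.random.default_rng(7)
m, r = 6, 2
x_bar = rng.standard_normal(m)
x_bar[0] = 0.0                                          # row 1 is unobserved
x_bar /= np.linalg.norm(x_bar)
B = np.column_stack([x_bar, np.eye(m)[:, 0]])           # B = [x_bar, e_1], orthonormal columns
U, _ = np.linalg.qr(rng.standard_normal((m, r)))

f, grad = fG_and_grad(U, B)
print(np.allclose(grad, numerical_gradient(U, B), atol=1e-6))
```

The product $b_i v_i^T$ is unchanged when both singular vectors flip sign, so the sign convention chosen by the SVD routine does not affect the result.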

IV. Performance Guarantees 

Consider the matrix completion problem described in (8). The following theorem describes completion scenarios for which a global optimum can be found with probability one.

Theorem 2: Consider the following cases:

1) (rank-one matrices with arbitrary sampling): Let $X_\Omega = \mathcal{P}_\Omega(X)$ for some unknown matrix $X$ with rank equal to one. Here, $\Omega \subset [m] \times [n]$ can be arbitrary.

2) (full sampling with arbitrary rank matrices): Let $X_\Omega = X$, i.e., $\Omega = [m] \times [n]$.

Suppose that $r = \operatorname{rank}(X)$ is given. Let $\mathcal{U}_G \subset \mathcal{U}_{m,r}$ be the preimage of $f_G(U) = 0$ (also defined in (11)). Let $U_0$ be randomly generated from the isotropic distribution on $\mathcal{U}_{m,r}$, and used as the initial point of the search procedure. With probability one, there exists a continuous path $U(t)$, $t \in [0, 1]$, such that $U(0) = U_0$, $U(1) \in \mathcal{U}_G$ and $\frac{d}{dt} f_G(U(t)) \le 0$ for all $t \in (0, 1)$, where the equality holds if and only if $U_0 \in \mathcal{U}_G$.

The proof of the theorem is outlined in Section IV-A. It is worth noting that almost all starting points are good: a starting point is certainly good if it is a consistent solution; otherwise, there exists a continuous path from this starting point to a global optimum such that the objective function keeps decreasing. The performance guarantee provided in Theorem 2 is strong in the sense that it does not require either incoherence conditions or large matrix sizes.

^2 For convenience, we use the following convention regarding the singular vectors $b_i$ and $v_i$: we let the first nonzero entry of $v_i$ be positive; otherwise, we let $v_i' = -v_i$ and $b_i' = -b_i$, and use $v_i'$ and $b_i'$ for singular value decomposition. The simultaneous changes in signs do not affect the singular value decomposition nor the computation of the gradient.

A simple corollary of Theorem 2 is the following result: suppose that the partial observations $X_\Omega$ admit a unique consistent solution in terms of the Frobenius norm; then a gradient search procedure using the geometric metric finds this unique solution with probability one. This conclusion follows from the fact that the solution set under the Frobenius norm contains only a single point and therefore $\mathcal{U}_G = \overline{\mathcal{U}_F} = \mathcal{U}_F$.

For the more general case where $r > 1$ and $\Omega \ne [m] \times [n]$, we cannot prove the same performance guarantees. Nevertheless, in Section IV-B, we present a collection of results that may be helpful for future exploration.

A. Proof of Theorem |2] 

For our proof techniques, we need the following two as- 
sumptions. 

Assumption I: There exists a global optimum Ux G l^m,r 
such that fa {Ux) = and all the r principal angles between 
span(L'x) and span(L/o) are less than tt/2. That is, all the 
singular values of UxUq are strictly positive. 

Assumption II: All of the ^^'s (the smallest principal angle 
between span(J7o) and span(Bi)) are less than tt/2. 

Remark 1: Suppose that the matrix Uq is randomly drawn 
from the uniform (isotropic) distribution on U,n,r- Then Uq 
satisfies both assumptions with probability one. This result 
can be easily verified using the probability density function 
of the principal angles jlTI. lEI. lilOl. iBII. 

Assuming that these two assumptions are satisfied, we have 
the following two theorems corresponding to the two cases in 
Theorem [21 respectively. 

Theorem 3: (Rank-One Case) Let $X_\Omega$ be the partial observation
matrix generated from a rank-one matrix. Let $u_0 \in \mathcal{U}_{m,1}$
be an estimate of the column space that satisfies Assumptions
I and II. Suppose that $\sum_{i=1}^{n} \sin^2 \theta_i \neq 0$. Then there
exists a continuous path $u(t) \in \mathcal{U}_{m,1}$ such that $u(0) = u_0$,
$u(1) \in \mathcal{U}_G$, and $\frac{d}{dt}\big|_{t=0} \sin^2 \theta_i \le 0$ for all $i \in [n]$, where
equality holds if and only if $\theta_i(0) = 0$.

Theorem 4: (Full-Sampling Case) Let $X \in \mathbb{R}^{m \times n}$ be a
rank-$r$ matrix. Let $U_0 \in \mathcal{U}_{m,r}$ satisfy Assumptions I and II.
Suppose that $\sum_{i=1}^{n} \sin^2 \theta_i \neq 0$. Then there exists a $U(t) \in
\mathcal{U}_{m,r}$ such that $U(0) = U_0$, $U(1) \in \mathcal{U}_G$ and $\frac{d}{dt}\big|_{t=0} \sin^2 \theta_i \le 0$
for all $i \in [n]$, where equality holds if and only if $\theta_i(0) = 0$.

The proofs of Theorems 3 and 4 are given in Appendices B and
C, respectively. Since the proof techniques differ significantly,
we present the two theorems and their proofs separately.

Both theorems are stated for derivatives taken at $t = 0$. Nevertheless,
the analysis can be extended to arbitrary $t \in [0,1]$,
that is, $\frac{d}{dt} \sin^2 \theta_i \le 0$ for all $t \in [0,1]$, where the equality holds
if and only if $\theta_i(t) = 0$. To show that this is the case, note that
in proving both Theorem 3 and Theorem 4 we constructed a
continuous path $U(t)$ such that $U(0) = U_0$ and $U(1) \in \mathcal{U}_G$.
By fixing this continuous path, we observe that:



1) All the $r$ principal angles between span($U(t)$) and
span($U(1)$) are monotonically decreasing as $t$ increases
to one. This implies that Assumption I holds for all
$t \in [0,1]$.

2) We have $\theta_i(t) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, \epsilon)$
for some sufficiently small $\epsilon > 0$. This claim can be
verified by invoking the facts that $\theta_i(0) < \pi/2$ for all
$i \in [n]$ and that $\theta_i$ is a continuous function for all
$i \in [n]$. As a result, all $U(t)$'s, where $t \in [0, \epsilon)$, satisfy
Assumptions I and II.

3) For every $t$ in the interval $[0, \epsilon)$, $U(t)$ is the starting
point of the geodesic path from $U(t)$ to $U(1)$, which
is a part of the geodesic path from $U(0)$ to $U(1)$. Using
the same proof techniques as in Appendices B and C, it
is clear that $\frac{d}{dt} \sin^2 \theta_i(t) \le 0$ for all $t \in [0, \epsilon)$. Hence,
$\theta_i(t) \le \theta_i(0) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, \epsilon)$.

4) The arguments above can be extended. It can be verified
that $\theta_i(t) \le \theta_i(0) < \pi/2$ for all $i \in [n]$ and for all
$t \in [0, 1]$. This implies that $U(t)$ satisfies Assumptions
I and II for all $t \in [0, 1]$. Hence, $\frac{d}{dt} \sin^2 \theta_i(t) \le 0$ for
all $i \in [n]$ and all $t \in [0,1]$, where the equality holds if
and only if $\theta_i(t) = 0$. Theorem 2 therefore holds.
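
The monotonicity claimed in items 1) to 4) can be illustrated numerically for the full-sampling case. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes unit-norm columns $x_i$ (so that span($B_i$) = span($x_i$)), takes $U(t)$ to be the Grassmannian geodesic from $U_0$ to a basis of the column space of $X$, written in principal-vector coordinates, and evaluates $\sin^2\theta_i(t) = 1 - \|U(t)^T x_i\|^2$ on a grid. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2

def orth(A):
    """Orthonormal basis of the column space of A."""
    return np.linalg.qr(A)[0]

U_X = orth(rng.standard_normal((m, r)))
X = U_X @ rng.standard_normal((r, n))     # fully observed rank-r matrix
X /= np.linalg.norm(X, axis=0)            # unit-norm columns x_i
U_0 = orth(rng.standard_normal((m, r)))   # random starting point

# Rotate both bases so that U_0a^T U_Xa = diag(cos(alpha)).
P, s, Qt = np.linalg.svd(U_0.T @ U_X)
U_0a, U_Xa = U_0 @ P, U_X @ Qt.T
alpha = np.arccos(np.clip(s, -1.0, 1.0))              # principal angles
G = (U_Xa - U_0a * np.cos(alpha)) / np.sin(alpha)     # unit tangent directions

def U(t):
    """Geodesic with U(0) = U_0a and U(1) = U_Xa."""
    return U_0a * np.cos(alpha * t) + G * np.sin(alpha * t)

ts = np.linspace(0.0, 1.0, 201)
# sin2[k, i] = sin^2(theta_i) at time ts[k]
sin2 = np.array([1.0 - np.sum((U(t).T @ X) ** 2, axis=0) for t in ts])

print(np.all(np.diff(sin2, axis=0) <= 1e-12))   # non-increasing along the path
print(np.allclose(sin2[-1], 0.0, atol=1e-10))   # every theta_i vanishes at t = 1
```

The division by `np.sin(alpha)` assumes that no principal angle is exactly zero, which holds for generic random draws.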

A direct consequence of Theorem 2 is that for almost
all $U_0 \in \mathcal{U}_{m,r}$, there exists a continuous path leading to
a global minimizer. However, one does not know this path
in the process of solving the matrix completion problem. A
practical approach is to use a gradient descent method. We
consider the following randomized gradient descent algorithm.
Let $U^{(i)} \in \mathcal{U}_{m,r}$, $i = 1, 2, \cdots$, be the starting point of the $i$-th
iteration. Clearly, $U^{(i)}$, $i \ge 2$, is also the end point of the
$(i-1)$-th iteration. We generate the sequence of $U^{(i)}$'s in the
following manner:

1) Let $U^{(1)}$ be randomly generated from the isotropic
distribution.

2) Set $i = 1$. Execute the following iterative process.

a) Compute the gradient $\nabla_{U^{(i)}} f_G$.

b) Let $U^{(i)}(t)$ be the geodesic curve starting at
$U^{(i)}(0) = U^{(i)}$ with direction $H = -\nabla_{U^{(i)}} f_G$.

c) Let $t^{(i)*}$ be such that $\frac{d}{dt} f_G(t^{(i)*}) = 0$ and
$\frac{d}{dt} f_G(t) < 0$ for all $t < t^{(i)*}$.

d) Randomly generate a $t^{(i)}$ from the uniform distribution
on $(0, t^{(i)*}]$.

e) Let $U^{(i+1)} = U^{(i)}(t^{(i)})$. Let $i = i + 1$. Go to Step
(a).

Due to the randomness in the algorithm, all $U^{(i)}$'s satisfy Assumptions
I and II with probability one. The objective function decreases
after each iteration. This gradient descent procedure converges
to a global minimum as the number of iterations approaches
infinity.
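
The randomized procedure can be sketched for the rank-one case as follows. This is a hedged illustration rather than the authors' implementation. It assumes that $B_i$ consists of the zero-padded, normalized observed part of column $i$ together with the standard basis vectors of the unobserved coordinates, so that for a unit vector $u$ one has $f_G(u) = n - u^T M u$ with $M = \sum_i B_i B_i^T$. The closed-form expression for $t^{(i)*}$ is specific to this rank-one sketch (along a great circle, $f_G$ is a sinusoid in $t$), and the sampling pattern and the names `projector_sum` and `randomized_descent` are hypothetical.

```python
import numpy as np

def projector_sum(X, mask):
    """M = sum_i B_i B_i^T under the assumed form of B_i (see the lead-in)."""
    m, n = X.shape
    M = np.zeros((m, m))
    for i in range(n):
        x = np.where(mask[:, i], X[:, i], 0.0)      # zero-padded observations
        M += np.outer(x, x) / (x @ x)               # projector onto span(x)
        M += np.diag((~mask[:, i]).astype(float))   # projector onto unobserved coordinates
    return M

def f_G(u, M, n):
    """Sum of sin^2(theta_i) for a unit vector u."""
    return n - u @ M @ u

def randomized_descent(M, n, rng, iters=500, tol=1e-12):
    u = rng.standard_normal(M.shape[0])
    u /= np.linalg.norm(u)                          # step 1: isotropic starting point
    history = [f_G(u, M, n)]
    for _ in range(iters):
        Mu = M @ u
        h = 2.0 * (Mu - u * (u @ Mu))               # step a/b: H = -grad f_G in the tangent space
        w = np.linalg.norm(h)
        if w < tol:
            break
        d = h / w
        # Step c: on u(t) = cos(wt) u + sin(wt) d, f_G is a sinusoid in 2wt,
        # so its first stationary point is available in closed form.
        a, b, c = u @ Mu, d @ M @ d, d @ Mu
        t_star = np.arctan2(c, 0.5 * (a - b)) / (2.0 * w)
        t = t_star * (1.0 - rng.random())           # step d: uniform on (0, t_star]
        u = np.cos(w * t) * u + np.sin(w * t) * d   # step e: move along the geodesic
        u /= np.linalg.norm(u)                      # guard against round-off drift
        history.append(f_G(u, M, n))
    return u, history

rng = np.random.default_rng(0)
m, n = 6, 9
X = np.outer(rng.standard_normal(m), rng.standard_normal(n))   # rank-one matrix
rows, cols = np.indices((m, n))
mask = (rows + cols) % 3 != 0                                    # hypothetical sampling pattern
M = projector_sum(X, mask)
u_hat, history = randomized_descent(M, n, rng)
print(history[0], history[-1])
```

In this sketch the recorded objective values should be non-increasing, in line with the statement that the objective function decreases after each iteration. For ranks larger than one the line search in step (c) would have to be carried out numerically.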

Remark 2: Denote the obtained global minimum by $U$. It
may happen that $U \in \mathcal{U}_G \setminus \mathcal{U}_F$. In this case, the solution is
inconsistent with respect to the standard Frobenius norm.
One can use perturbation techniques to move $U$ from the
boundary of $\mathcal{U}_F$ to the interior region of $\mathcal{U}_F$.






B. The General Framework 

For the cases that are not described in Theorem 2, we have
the following corollary.

Corollary 1: (General Cases) Let $X \in \mathbb{R}^{m \times n}$ be a rank-$r$
matrix. Let $U_X \in \mathcal{U}_G$ be a global minimum. For each
$i \in [n]$, the following statements are true. Let $u_{X,i} \in$
span($U_X$) $\cap$ span($B_i$) be a unit norm vector. Let $U_0 \in \mathcal{U}_{m,r}$
and $w_i \in \mathcal{U}_{r,1}$ be randomly drawn from the corresponding
isotropic distributions, respectively. Then with probability one,
the vector $u_{0,i} = U_0 w_i$ is not orthogonal to $u_{X,i}$. Suppose
that this is true. Define $\theta_i = \cos^{-1} \|\mathcal{P}(u_i(t), B_i)\|$. There
exists a continuous path $u_i(t) \in \mathcal{U}_{m,1}$ such that $u_i(0) = u_{0,i}$,
$u_i(1) \in$ span($u_{X,i}$) $\cap\ \mathcal{U}_{m,1}$, and $\frac{d}{dt} \sin^2 \theta_i \le 0$, where the
equality holds if and only if $\theta_i(t) = 0$.

Proof: Without loss of generality, we assume that
$\langle u_{0,i}, u_{X,i} \rangle > 0$. The desired continuous path is given by

$$u_i(t) = \frac{(1-t)\, u_{0,i} + t\, u_{X,i}}{\|(1-t)\, u_{0,i} + t\, u_{X,i}\|}, \quad t \in [0, 1].$$

The detailed arguments are the same as those in the proof of
Theorem 3 and are therefore omitted. ■

Remark 3: This corollary is similar to Theorems 3 and 4
in the sense that there exist continuous paths along which the
atomic functions decrease.

At the same time, Corollary 1 differs from Theorems 3
and 4 in two aspects. First, the paths $u_i(t)$ in Corollary
1 may be different for different $i$'s, while in Theorems 3
and 4 a single continuous path $U(t)$ is constructed. Second,
the angle $\theta_i$ in Corollary 1 is essentially the principal angle
between the 1-dimensional subspace span($u_i(t)$) and the
subspace span($B_i$). In contrast, Theorems 3 and 4 involve the
minimum principal angle between the $r$-dimensional subspace
span($U(t)$) and the subspace span($B_i$).

V. Conclusion 

We considered the problem of how to search for a consistent
completion of low-rank matrices. We showed that the Frobenius
norm combined with a projection operator results in a discontinuous
objective function and therefore makes gradient
descent approaches fail. We proposed to replace the Frobenius
norm with the chordal distance. The chordal distance is the
"best" smooth version of the Frobenius norm in the sense that
the solution set of the former is the closure of the solution set
of the latter. Based on the chordal distance, we derived strong
performance guarantees for two completion scenarios. The
derived performance guarantees do not rely on incoherence
conditions or large matrix sizes, and they hold with probability
one.



Appendix 

A. Proof of Theorem 1

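We omit the subscript $i$ to simplify notation. The proof
consists of two parts, showing that:

1) $\mathcal{U}_F \subset \mathcal{U}_G$;

2) for any given $U_0 \in \mathcal{U}_G$, there exists a sequence
$\{U^{(n)}\} \subset \mathcal{U}_F$ such that $\lim_{n \to \infty} \|U_0 - U^{(n)}\|_F = 0$.

We start by proving that $\mathcal{U}_F \subset \mathcal{U}_G$. For any given $U \in \mathcal{U}_F$,
there exists a nonzero vector $w \in \mathbb{R}^r$ such that $U_\Omega w = x_\Omega$.
Let $b = Uw/\|w\|$. Clearly, $\|b\| = 1$. Recall the formula
for $B_{x_\Omega}$. We can write $b$ as a linear combination of columns
of $B_{x_\Omega}$:

$$b = \frac{1}{\|w\|}\, x_\Omega + \sum_{j \in \Omega^c} b_j e_j .$$

As a result, $b \in \mathrm{span}(U) \cap \mathrm{span}(B_{x_\Omega})$.
It follows that the largest singular value of $B_{x_\Omega}^T U$ is one.
Therefore, $U \in \mathcal{U}_G$, and we thus have $\mathcal{U}_F \subset \mathcal{U}_G$.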

To prove the second part, we make use of the following
notation. For any given $U_0 \in \mathcal{U}_G$, let $u_1, \cdots, u_r$ be the left
singular vectors of the matrix $U_0 U_0^T B_{x_\Omega}$, where $u_i$ corresponds
to the $i$-th largest singular value. Let $k$ be the multiplicity of the
singular value one, i.e., the number of singular values that are equal to
one. Let $U = [u_1, \cdots, u_r]$, $U_{1:k} = [u_1, \cdots, u_k]$ and
$U_{k+1:r} = [u_{k+1}, \cdots, u_r]$.
Clearly, $\lambda_{\max}(U_{k+1:r}^T B_{x_\Omega}) < 1$.

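It suffices to focus on $U$ instead of $U_0$. That is, to prove the
second part, it suffices to find a sequence in $\mathcal{U}_F$ converging
to $U$. To verify this claim, let $V = U^T U_0$. Then $V \in \mathcal{U}_{r,r}$
and $U_0 = UV$. Suppose that $\{U^{(n)}\} \subset \mathcal{U}_F$ is a sequence
such that $U^{(n)} \to U$. It is clear that $U^{(n)} V \to UV = U_0$.
Furthermore, since $V$ is an orthogonal matrix and hence
span($U_\Omega^{(n)} V$) = span($U_\Omega^{(n)}$),
one has $U^{(n)} V \in \mathcal{U}_F$. The sequence $\{U^{(n)} V\} \subset \mathcal{U}_F$ is the
desired sequence that converges to $U_0$. It is also important to
note that $U \in \mathcal{U}_G$, since

$$\lambda(U_0 U_0^T B_{x_\Omega}) = \lambda(U V V^T U^T B_{x_\Omega}) = \lambda(U U^T B_{x_\Omega}).$$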

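We claim that

$$U \in \mathcal{U}_F \ \text{ if and only if } \ U_{1:k,\Omega} \neq 0. \tag{14}$$

To prove this claim, we shall show that

$$U_{1:k,\Omega} \neq 0 \ \Rightarrow \ U \in \mathcal{U}_F \tag{15}$$

and

$$U_{1:k,\Omega} = 0 \ \Rightarrow \ U \notin \mathcal{U}_F. \tag{16}$$
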

To prove (15), suppose that $U_{1:k,\Omega} \neq 0$. Without loss of
generality, let $u_{1,\Omega} \neq 0$. Since $u_1$ is the left singular vector
corresponding to the singular value equal to one, $u_1$ can be
written as a linear combination of the columns of $B_{x_\Omega}$:
$u_1 = a_1 x_\Omega + \sum_{j \in \Omega^c} a_j e_j$. Since $u_{1,\Omega} = a_1 x_\Omega \neq 0$, one has
$a_1 \neq 0$. As a result, $x_\Omega = a\, u_{1,\Omega}$ for some constant $a \neq 0$. Hence,
$d_F(x_\Omega, U) = 0$ and $U \in \mathcal{U}_F$.

To prove (16), assume that $U_{1:k,\Omega} = 0$. Since
$\mathcal{P}(x_\Omega, U_\Omega) = \mathcal{P}(x_\Omega, U_{k+1:r,\Omega})$, proving that $U \notin \mathcal{U}_F$ is
equivalent to proving that $x_\Omega - \mathcal{P}(x_\Omega, U_{k+1:r,\Omega}) \neq 0$. This
inequality can be proved by contradiction. Suppose that we
have an equality. Then there exists a vector $w \in \mathbb{R}^{r-k}$
such that $U_{k+1:r,\Omega}\, w = x_\Omega$. Let $b = U_{k+1:r}\, w / \|w\|$. It is
straightforward to show (using similar arguments as the ones
used for proving $\mathcal{U}_F \subset \mathcal{U}_G$) that $b \in$ span($B_{x_\Omega}$) and the
largest singular value of $U_{k+1:r}^T B_{x_\Omega}$ is one. This contradicts
the fact that $\lambda_{\max}(U_{k+1:r}^T B_{x_\Omega}) < 1$.

Now we are ready to construct a sequence in $\mathcal{U}_F$ converging
to $U$. If $U_{1:k,\Omega} \neq 0$, then $U \in \mathcal{U}_F$ and it is trivial to find a
sequence in $\mathcal{U}_F$ converging to $U$. It remains to find a sequence
$\{U^{(n)}\} \subset \mathcal{U}_F$ that converges to $U$ when $U_{1:k,\Omega} = 0$. Define
$x_r = x_\Omega - \mathcal{P}(x_\Omega, U_\Omega)$. Since $U_{1:k,\Omega} = 0$, one has $U \notin \mathcal{U}_F$
and $x_r \neq 0$. Note that $x_{r,\Omega^c} = 0$ and that $x_{r,\Omega} \perp u_{i,\Omega}$ for all
$i \in [r]$. It can be verified that $x_r \perp u_1, \cdots, x_r \perp u_r$. Let



Ui 



,U2, 



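It can be verified that $U_\epsilon \in \mathcal{U}_{m,r}$. Furthermore, $\mathcal{P}(x_\Omega, U_{\epsilon,\Omega}) =
\mathcal{P}(x_\Omega, [x_r, U_{k+1:r,\Omega}]) = x_\Omega$ and therefore $U_\epsilon \in \mathcal{U}_F$ for all
$\epsilon \neq 0$. Now choose a sequence $\{U^{(n)}\} = \{U_{1/n}\}$. It is a
sequence in $\mathcal{U}_F$ and it converges to $U$. This completes the
proof.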
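
A small numerical example may help to visualize the gap between the two solution sets. The sketch below is an illustration built for this purpose, not taken from the paper: it uses a single column with $m = 3$, observed entries $x_\Omega = (1, 1)$ in the first two rows, and the hypothetical family `u_eps`, which stays inside span($B_{x_\Omega}$) for every $\epsilon$. It assumes that $B_{x_\Omega}$ consists of the zero-padded, normalized observation and the basis vector of the unobserved coordinate. The Frobenius-type residual vanishes for every $\epsilon \neq 0$ but jumps at $\epsilon = 0$, while the geometric residual is zero throughout.

```python
import numpy as np

x_obs = np.array([1.0, 1.0])   # observed entries (rows 0 and 1); row 2 is unobserved
B = np.column_stack([
    np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0),   # zero-padded, normalized observation
    np.array([0.0, 0.0, 1.0]),                  # direction of the unobserved coordinate
])

def u_eps(eps):
    """Unit vector in span(B) whose observed part shrinks to zero as eps -> 0."""
    return np.array([eps / np.sqrt(2.0), eps / np.sqrt(2.0), np.sqrt(1.0 - eps ** 2)])

def frobenius_residual(u):
    """Distance from x_obs to the span of the observed part of u."""
    u_obs = u[:2]
    nrm2 = u_obs @ u_obs
    proj = (u_obs @ x_obs) / nrm2 * u_obs if nrm2 > 0 else np.zeros_like(x_obs)
    return np.linalg.norm(x_obs - proj)

def geometric_residual(u):
    """sin^2 of the principal angle between span(u) and span(B)."""
    return 1.0 - np.sum((B.T @ u) ** 2)

for eps in [1e-1, 1e-3, 0.0]:
    u = u_eps(eps)
    print(eps, frobenius_residual(u), geometric_residual(u))
```

Here `u_eps(0.0)` is a limit point of vectors that are consistent in the Frobenius sense, yet it is not itself consistent in that sense, which is the situation covered by the second part of the proof.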

B. Proof of Theorem 3

Since $X_\Omega$ is generated from a rank-one matrix, there exists a
$u_X \in \mathcal{U}_{m,1}$ such that $u_X \in$ span($B_i$) for all $i \in [n]$. Without
loss of generality, we assume $\langle u_0, u_X \rangle > 0$: by Assumption I,
$\langle u_0, u_X \rangle \neq 0$; if $\langle u_0, u_X \rangle < 0$, we replace $u_X$ with $-u_X$.

Now define

$$u(t) = \frac{(1-t)\, u_0 + t\, u_X}{\|(1-t)\, u_0 + t\, u_X\|} = \frac{(1-t)\, u_0 + t\, u_X}{L(t)},$$

where $L(t) = \|(1-t)\, u_0 + t\, u_X\|$. Clearly $u(0) = u_0$ and
$u(t) \in \mathcal{U}_{m,1}$ in a neighborhood of $t = 0$.
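For every $i \in [n]$, we shall show that

$$\frac{d}{dt}\Big|_{t=0} \frac{1}{2} \sin^2 \theta_i = -\frac{d}{dt}\Big|_{t=0} \frac{1}{2} \cos^2 \theta_i \le 0, \tag{17}$$

where the equality holds if and only if $\theta_i = 0$. Let $\mathcal{P}_i u$ denote
the vector $\mathcal{P}(u, B_i) = B_i B_i^T u$. Since $u_X \in$ span($B_i$), one
has

$$\mathcal{P}_i u(t) = \frac{1}{L(t)} \left( (1-t)\, \mathcal{P}_i u_0 + t\, u_X \right).$$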

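We then have

$$\frac{d}{dt}\Big|_{t=0} \frac{1}{2} \cos^2 \theta_i
= \frac{d}{dt}\Big|_{t=0} \frac{1}{2} \left\| \frac{1-t}{L(t)}\, \mathcal{P}_i u_0 + \frac{t}{L(t)}\, u_X \right\|^2
= (-1 - L'(0))\, \|\mathcal{P}_i u_0\|^2 + \langle \mathcal{P}_i u_0, u_X \rangle.$$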

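Note that

$$\langle \mathcal{P}_i u_0, u_X \rangle = u_X^T B_i B_i^T u_0 = \langle u_0, \mathcal{P}_i u_X \rangle = \langle u_0, u_X \rangle.$$

Consequently,

$$\frac{d}{dt}\Big|_{t=0} \frac{1}{2} \cos^2 \theta_i = (-1 - L'(0))\, \|\mathcal{P}_i u_0\|^2 + \langle u_0, u_X \rangle. \tag{18}$$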



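The term $L'(0)$ can be computed as follows. Note that

$$L^2(t) = (1-t)^2 \|u_0\|^2 + t^2 \|u_X\|^2 + 2(t - t^2) \langle u_0, u_X \rangle
= 1 - 2t + 2t^2 + 2(t - t^2) \langle u_0, u_X \rangle.$$

Therefore,

$$\frac{d}{dt}\Big|_{t=0} L^2(t) = -2 + 2 \langle u_0, u_X \rangle = 2 L(0) L'(0).$$

As a result,

$$L'(0) = -1 + \langle u_0, u_X \rangle. \tag{19}$$

Substituting (19) into (18), one can see that

$$\frac{d}{dt}\Big|_{t=0} \frac{1}{2} \cos^2 \theta_i = \langle u_0, u_X \rangle \left( 1 - \|\mathcal{P}_i u_0\|^2 \right) \ge 0,$$

where the equality holds if and only if $\|\mathcal{P}_i u_0\| = 1$, i.e.,
$u_0 \in$ span($B_i$) and $\theta_i = 0$. This completes the proof.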
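
The closed-form derivative obtained at the end of this proof can be cross-checked with a finite difference. The sketch below is a sanity check under assumptions, not part of the proof: `B` is a random orthonormal basis of a subspace that contains `u_X` (a hypothetical stand-in for $B_i$), and `half_cos2` evaluates $\frac{1}{2}\cos^2\theta_i$ along the normalized path $u(t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 7, 3

u_X = rng.standard_normal(m)
u_X /= np.linalg.norm(u_X)
# Orthonormal basis of a k-dimensional subspace containing u_X (stand-in for B_i).
B = np.linalg.qr(np.column_stack([u_X, rng.standard_normal((m, k - 1))]))[0]

u_0 = rng.standard_normal(m)
u_0 /= np.linalg.norm(u_0)
if u_0 @ u_X < 0:
    u_0 = -u_0   # enforce <u_0, u_X> > 0; plays the role of replacing u_X with -u_X

def half_cos2(t):
    """(1/2) cos^2(theta) along u(t) = ((1-t) u_0 + t u_X) / L(t)."""
    v = (1.0 - t) * u_0 + t * u_X
    u = v / np.linalg.norm(v)
    return 0.5 * np.sum((B.T @ u) ** 2)

h = 1e-6
finite_diff = (half_cos2(h) - half_cos2(-h)) / (2.0 * h)        # central difference at t = 0
closed_form = (u_0 @ u_X) * (1.0 - np.sum((B.T @ u_0) ** 2))    # <u_0,u_X> (1 - ||P u_0||^2)
print(finite_diff, closed_form)
```

The two printed numbers should agree up to the finite-difference error, and the closed-form value is non-negative by construction.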

C. Proof of Theorem 4

Let $U_X \in \mathcal{U}_{m,r}$ be such that every column of $X$ is
in the subspace span($U_X$). Consider the compact singular value
decomposition $U_0 U_0^T U_X U_X^T = U_0' S U_X'^T$, where $S \in \mathbb{R}^{r \times r}$
is the diagonal matrix containing the singular values and
$U_0'$ and $U_X'$ are the left and right singular vector matrices,
respectively. Clearly, $U_0$ and $U_0'$ generate the same subspace,
and so do $U_X$ and $U_X'$. For simplicity, we present our
proof for $U_0'$ and $U_X'$ and omit the superscripts. With this
simplification, one has $U_0^T U_X = S = \mathrm{diag}([\lambda_1, \cdots, \lambda_r])$.

For the $i$-th column of $X$, we compute $\nabla_{U_0} \cos\theta_i$. Since
we are considering the full sampling case, we have $B_i = x_i$.
Because $x_i \in$ span($U_X$), there exists $w \in \mathcal{U}_{r,1}$ such that
$x_i = U_X w$. To compute $\nabla_{U_0} \cos\theta_i$, we need the first left
and the first right singular vectors of the matrix $x_i x_i^T U_0$. The
first left singular vector is clearly $x_i$ and the first right singular
vector equals $U_0^T x_i = U_0^T U_X w = Sw$. Hence,

$$\nabla_{U_0} \cos\theta_i = (I - U_0 U_0^T)\, x_i w^T S^T = (I - U_0 U_0^T)\, U_X w w^T S^T.$$



According to Lemma 1, $(I - U_0 U_0^T)\, U_X$ can be written
as $G\, \mathrm{diag}([\sin\alpha_1, \cdots, \sin\alpha_r])$, where $G = [g_1, \cdots, g_r] \in \mathcal{U}_{m,r}$
and $\alpha_j = \cos^{-1} \lambda_j$, $j = 1, \cdots, r$, are the principal
angles between span($U_0$) and span($U_X$).

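We consider the geodesic $U(t)$ from $U_0$ to $U_X$. In Lemma
1 (part 1), we show that this geodesic is given by the $U(t)$
satisfying $U(0) = U_0$ and $\dot{U}(0) = G\, \mathrm{diag}([\alpha_1, \cdots, \alpha_r])$.
Along this path, we have

$$\begin{aligned}
\frac{d}{dt}\Big|_{t=0} \cos\theta_i
&= \left\langle \nabla_{U_0} \cos\theta_i,\ G\, \mathrm{diag}([\alpha_1, \cdots, \alpha_r]) \right\rangle \\
&= \mathrm{trace}\left( \left( G\, \mathrm{diag}([\alpha_1, \cdots, \alpha_r]) \right)^T \left( (I - U_0 U_0^T)\, U_X \right) w w^T S^T \right) \\
&= \mathrm{trace}\left( \mathrm{diag}([\alpha_1 \sin\alpha_1, \cdots, \alpha_r \sin\alpha_r])\, w w^T S \right) \\
&= \sum_{j=1}^{r} w_j^2\, \alpha_j \sin\alpha_j \cos\alpha_j \ \ge\ 0.
\end{aligned} \tag{20}$$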
We claim that under Assumption II, equality in (20) holds if
and only if $\theta_i = 0$. If $\theta_i = 0$, then $x_i \in$ span($U_0$). According
to Lemma 1 (part 2), $w_j = 0$ for all $j$ such that $\alpha_j \neq 0$.
The equality in (20) thus holds. Otherwise, if $\theta_i \neq 0$, then
$x_i \notin$ span($U_0$). Again, according to Lemma 1 (part 2), there
exists a $j \in [r]$ such that $\alpha_j > 0$ and $w_j \neq 0$. Hence, we
have a strict inequality in (20). Finally, note that

$$\frac{d}{dt}\Big|_{t=0} \sin^2 \theta_i = -\frac{d}{dt}\Big|_{t=0} \cos^2 \theta_i \le 0.$$

This proves the theorem.